
    A Speaker Diarization System for Studying Peer-Led Team Learning Groups

    Peer-led team learning (PLTL) is a model for teaching STEM courses in which small student groups meet periodically to discuss coursework collaboratively. Automatic analysis of PLTL sessions would help education researchers gain insight into how learning outcomes are affected by individual participation, group behavior, team dynamics, etc. Speech and language technology can support such analysis, with speaker diarization laying the foundation. In this study, a new corpus called CRSS-PLTL is established, containing speech data from 5 PLTL teams over a semester (10 sessions per team, with 5-to-8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (a portable audio recorder), which provides multiple audio recordings of each event. Our proposed solution is unsupervised and combines a new online speaker change detection algorithm, termed the G3 algorithm, with Hausdorff-distance based clustering to improve detection accuracy. Additionally, we exploit cross-channel information to refine the diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analyses, such as the number of conversational turns taken in a session and the speaking-time duration (participation) of each speaker.
    Comment: 5 pages, 2 figures, 2 tables. Proceedings of INTERSPEECH 2016, San Francisco, US
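    The abstract above pairs speaker change detection with Hausdorff-distance based clustering. As a minimal illustration of the distance itself (not the paper's G3 algorithm or its clustering pipeline, which are not specified here), the following sketch computes the symmetric Hausdorff distance between two hypothetical sets of per-frame feature vectors, as one might compare candidate speaker segments:

    ```python
    import numpy as np

    def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
        """Symmetric Hausdorff distance between two sets of feature vectors.

        a, b: arrays of shape (n, d) and (m, d), e.g. per-frame acoustic
        features from two candidate speaker segments (illustrative only).
        """
        # Pairwise Euclidean distances between every vector in a and in b.
        diffs = a[:, None, :] - b[None, :, :]
        d = np.sqrt((diffs ** 2).sum(axis=-1))
        # Directed distances: worst-case nearest-neighbour gap each way.
        d_ab = d.min(axis=1).max()
        d_ba = d.min(axis=0).max()
        return max(d_ab, d_ba)

    seg1 = np.array([[0.0, 0.0], [1.0, 1.0]])
    seg2 = seg1 + np.array([3.0, 0.0])  # same cloud shifted by 3 along x
    print(hausdorff_distance(seg1, seg1))  # -> 0.0
    print(hausdorff_distance(seg1, seg2))  # -> 3.0
    ```

    In an agglomerative clustering loop, segments whose Hausdorff distance falls below a threshold would be merged into one speaker cluster; the threshold and feature choice here are assumptions, not details from the paper.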

    FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data

    The Fearless Steps Initiative by UTDallas-CRSS led to the digitization, recovery, and diarization of 19,000 hours of original analog audio data, as well as the development of algorithms to extract meaningful information from this multi-channel naturalistic data resource. The 2020 FEARLESS STEPS (FS-2) Challenge is the second annual challenge held for the speech and language technology community, intended to motivate supervised-learning algorithm development for multi-party and multi-stream naturalistic audio. In this paper, we present an overview of the challenge sub-tasks, data, performance metrics, and lessons learned from Phase-2 of the Fearless Steps Challenge (FS-2). We present advancements made in FS-2 through extensive community outreach and feedback, describe innovations in the challenge corpus development, and present revised baseline results. Finally, we discuss the challenge outcome and general trends in system development across both phases of the challenge (Phase FS-1, unsupervised, and Phase FS-2, supervised), and its continuation into multi-channel challenge tasks for the upcoming Fearless Steps Challenge Phase-3.
    Comment: Paper accepted at the Interspeech 2020 Conference

    Novel statistical voice activity detectors

    In this thesis, we propose several practical statistical voice activity detectors (VADs) that combine the voice-activity information in the short-term and long-term statistics of the speech signal. Unlike most VADs, which assume that the cues to activity lie within the frame alone, the proposed VAD schemes seek evidence of activity in the current as well as the neighboring frames. In particular, we develop primary and contextual detectors to process the short-term and long-term information, respectively. We use the perceptual Ephraim-Malah (PEM) model to develop three primary detectors based on the Bayesian, Neyman-Pearson (NP), and competitive NP (CNP) approaches. Moreover, viewing voice activity detection as a composite hypothesis test in which the prior signal-to-noise ratio (SNR) forms the free parameter, we show that a correlation exists between the prior SNR and the hypothesis: a high prior SNR is more likely to be associated with the 'speech hypothesis' than the 'pause hypothesis', and vice versa. Unlike the Bayesian and NP approaches, the CNP approach alone exploits this correlation.
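    To make the Neyman-Pearson framing concrete, here is a toy frame-level likelihood-ratio VAD under zero-mean Gaussian models for noise (H0) and speech (H1) with a fixed decision threshold. This is a generic sketch of the NP-style test, not the thesis's PEM-based detectors; the variances and threshold are assumed values:

    ```python
    import numpy as np

    def lr_vad(frames, noise_var, speech_var, threshold):
        """Frame-wise likelihood-ratio VAD under zero-mean Gaussian models.

        H0 (pause): x ~ N(0, noise_var); H1 (speech): x ~ N(0, speech_var).
        A frame is labelled speech when its average per-sample log-likelihood
        ratio exceeds the fixed (Neyman-Pearson style) threshold.
        """
        llr = 0.5 * (np.log(noise_var / speech_var)
                     + frames ** 2 * (1.0 / noise_var - 1.0 / speech_var))
        return llr.mean(axis=1) > threshold

    # Synthetic check: 10 quiet "noise" frames, then 10 louder "speech" frames.
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 1.0, size=(10, 160))   # variance 1
    speech = rng.normal(0.0, 4.0, size=(10, 160))  # variance 16
    frames = np.vstack([noise, speech])
    decisions = lr_vad(frames, noise_var=1.0, speech_var=16.0, threshold=0.0)
    print(decisions.astype(int))
    ```

    A contextual detector in the thesis's sense would additionally smooth or pool these per-frame decisions over neighboring frames; that stage is omitted here.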

    Automatic language analysis and identification based on speech production knowledge

    In this paper, a language analysis and classification system that leverages knowledge of speech production is proposed. The proposed scheme automatically extracts key production traits (or “hot-spots”) that are strongly tied to the underlying language structure. In particular, the speech utterance is first parsed into consonant and vowel clusters. Subsequently, the production traits for each cluster are represented by the corresponding temporal evolution of speech articulatory states. It is hypothesized that a selection of these production traits is strongly tied to the underlying language and can be exploited for language ID. The new scheme is evaluated on our South Indian Languages (SInL) corpus, which consists of 5 closely related languages spoken in India, namely Kannada, Tamil, Telugu, Malayalam, and Marathi. Good accuracy is achieved, with a rate of 65% obtained on a difficult 5-way classification task with about 4 seconds of training and test speech per utterance. Furthermore, the proposed scheme is also able to automatically identify key production traits of each language (e.g., dominant vowels, stop-consonants, fricatives, etc.).
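    The first step described above, parsing an utterance into consonant and vowel clusters, can be sketched as grouping a phone sequence into maximal runs of the same class. The phone inventory and the character-level input below are illustrative assumptions, not the paper's front end (which would operate on recognized phones or articulatory states):

    ```python
    VOWELS = {"a", "e", "i", "o", "u"}  # toy inventory, not the paper's phone set

    def cv_clusters(phones):
        """Group a phone sequence into maximal runs of consonants (C) or vowels (V)."""
        clusters = []
        for p in phones:
            label = "V" if p in VOWELS else "C"
            if clusters and clusters[-1][0] == label:
                clusters[-1][1].append(p)   # extend the current run
            else:
                clusters.append([label, [p]])  # start a new run
        return [(label, run) for label, run in clusters]

    print(cv_clusters(list("kannada")))
    # -> [('C', ['k']), ('V', ['a']), ('C', ['n', 'n']), ('V', ['a']), ('C', ['d']), ('V', ['a'])]
    ```

    In the paper's pipeline, each such cluster would then be described by the temporal evolution of articulatory states before feeding a language classifier; that modeling stage is beyond this sketch.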